
feat: add num_chunks_override to FusedLinearCrossEntropyLoss #3

Merged
ca1207 merged 4 commits into main from feat/flce-num-chunks-override-v2
Apr 15, 2026

Conversation

@WyldeCat
Member

Summary

  • Add a num_chunks_override parameter to the FLCE forward, Function, and Loss module
  • Allow users to set the chunk count directly instead of relying on auto-computation (~32 chunks for V=220k)
  • Free chunk tensors at the end of each loop iteration so that two logits chunks never co-exist in GPU memory
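
A minimal sketch of how an override like this could be resolved, for illustration only. The helper name `resolve_num_chunks`, its arguments, and the `ceil(V / H)` auto-heuristic are assumptions, not the code in this PR; the hidden size of 7168 in the usage note is likewise illustrative.

```python
def resolve_num_chunks(bt, vocab_size, hidden_size, num_chunks_override=None):
    """Return (num_chunks, chunk_size) for a batch of `bt` token rows.

    Hypothetical helper: the auto path uses ceil(V / H) as a stand-in
    heuristic, which is an assumption and not taken from this PR.
    """
    if num_chunks_override is not None:
        if num_chunks_override < 1:
            raise ValueError("num_chunks_override must be >= 1")
        num_chunks = num_chunks_override
    else:
        # Assumed auto-computation: more chunks as V grows relative to H.
        num_chunks = -(-vocab_size // hidden_size)  # ceiling division

    # Never use more chunks than there are rows, and always at least one.
    num_chunks = max(1, min(num_chunks, bt))
    chunk_size = -(-bt // num_chunks)  # rows per chunk, rounded up
    return num_chunks, chunk_size


auto = resolve_num_chunks(8192, 220_000, 7168)
overridden = resolve_num_chunks(8192, 220_000, 7168, num_chunks_override=4)
```

Under these assumptions the auto path yields a few dozen small chunks, while the override yields four large ones, which is the trade-off the summary describes: fewer loop iterations at the cost of a larger per-chunk logits buffer.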

Motivation

For a large vocabulary (V=220k), the auto-computed chunk count is ~32, which causes excessive elementwise kernel launches between chunks. Overriding to 4-8 chunks reduces launch overhead with minimal memory impact, since peak memory is dominated by activations rather than by FLCE logits chunks.

Test plan

  • 8-node motif3 training with num_chunks_override=4: MFU ~28% (vs ~27% default)
  • Loss values match between chunk configurations
  • Unit tests
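
The loss-equivalence check in the test plan can be sketched in plain Python as follows. This is a toy stand-in, not the PR's unit test: `cross_entropy_row` and `chunked_mean_loss` are hypothetical helpers, and the tiny shapes (64 rows, 16 classes) are chosen only to keep the example fast.

```python
import math
import random


def cross_entropy_row(logits, target):
    """Numerically stable cross-entropy for one row of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def chunked_mean_loss(rows, targets, num_chunks):
    """Mean loss accumulated chunk by chunk over the row dimension."""
    n = len(rows)
    chunk_size = -(-n // num_chunks)  # ceiling division
    total = 0.0
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        total += sum(cross_entropy_row(rows[i], targets[i]) for i in range(start, end))
    return total / n


random.seed(0)
rows = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(64)]
targets = [random.randrange(16) for _ in range(64)]

loss_4 = chunked_mean_loss(rows, targets, num_chunks=4)
loss_32 = chunked_mean_loss(rows, targets, num_chunks=32)
# Chunking only changes summation order, so the results agree up to rounding.
assert math.isclose(loss_4, loss_32, rel_tol=1e-9)
```

Comparing with a tolerance rather than exact equality is deliberate, since different chunk counts group the floating-point additions differently.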

🤖 Generated with Claude Code

WyldeCat and others added 2 commits April 14, 2026 08:14
Allow users to override the auto-computed chunk count in FLCE.
Default auto-calculation yields ~32 chunks for large vocab (V=220k),
causing excessive elementwise kernel launches between chunks.
Overriding to fewer chunks (e.g. 4-8) reduces kernel launch overhead
with minimal memory impact since peak is dominated by activations.

Also free chunk tensors (del logits_chunk, grad_logits_chunk,
_input_chunk) at end of each loop iteration to prevent two logits
chunks co-existing in GPU memory between iterations.
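
Why the explicit del matters can be shown with a small pure-Python sketch. The `Chunk` class below is a hypothetical stand-in for a logits tensor that counts live instances; it relies on CPython's reference counting and is not the code from this commit.

```python
class Chunk:
    """Stand-in for a logits chunk; tracks how many instances are alive."""

    live = 0
    peak = 0

    def __init__(self):
        Chunk.live += 1
        Chunk.peak = max(Chunk.peak, Chunk.live)

    def __del__(self):
        Chunk.live -= 1


def peak_live_chunks(num_chunks, free_each_iteration):
    """Run a chunk loop and report the peak number of chunks alive at once."""
    Chunk.live = 0
    Chunk.peak = 0
    for _ in range(num_chunks):
        # Without del, the previous chunk is still bound to this name while
        # the new one is being constructed, so two chunks briefly co-exist.
        logits_chunk = Chunk()
        if free_each_iteration:
            del logits_chunk  # drop the reference before the next iteration
    return Chunk.peak


without_del = peak_live_chunks(4, free_each_iteration=False)
with_del = peak_live_chunks(4, free_each_iteration=True)
```

In this sketch the loop without del peaks at two live chunks and the loop with del peaks at one. With large chunks (few chunks, as with a small num_chunks_override) that transient second buffer is proportionally more expensive, which is presumably why the two changes ship together.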

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@WyldeCat WyldeCat marked this pull request as ready for review April 14, 2026 08:31
WyldeCat and others added 2 commits April 14, 2026 08:33
…tropyLoss

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ca1207 ca1207 merged commit 2a60d4b into main Apr 15, 2026
4 of 7 checks passed

3 participants